Use of openair::polarPlot() for source characterisation
1 Bivariate polar plots
1.1 Introduction
Bivariate polar plots are an effective way in which to gain information about sources of air pollution where wind direction and wind speed can reveal information about the character of sources. They show how concentrations vary by wind direction and wind speed, shown in polar coordinates. In addition, they show the concentration as a smoothed surface. Because pollutant concentrations vary in important ways when plotted against wind direction and speed they can provide insights into the character of the emission sources (D. C. Carslaw et al. 2006; David C. Carslaw and Beevers 2013; Uria-Tellaetxe and Carslaw 2014; Grange, Lewis, and Carslaw 2016).
In these plots the radial axis is by default wind speed — changing from zero wind speed in the centre and increasing outwards. The openair manual and associated papers provide much more detail on their purpose and development.
It should be stressed that the radial axis does not have to be wind speed. It could be temperature or a measure of atmospheric stability, or indeed any other continuous variable. The key point is that the variable on the radial axis in some way helps to differentiate between different sources types. A better idea of these ideas is shown in the examples below.
2 Source types
2.1 Area source
Area sources such as large urban areas tend to be characterised by having highest concentrations under low wind speed conditions i.e. stable atmospheres. In the plot below there is a clear ‘bull’s eye’ effect where concentrations are much higher under low wind speed conditions from all directions. In the plot below a large area source (representing an urban area such as London) was modelled using the ADMS model.
It is easy to see whether the modelled behaviour above is replicated by real atmospheric measurements. As an example 1-year of hourly data can be analysed from the North Kensington urban background site in inner London. The data are imported using the importAURN() function.
It is easy to import the data using openair using the importAURN() family of functions. Type ?importAURN to find out more about this (and other) sites. Note that wind speed, wind direction and temperature may also be imported (for data from around 2010). These meteorological data are not based on surface measurements but derived from the WRF regional model.
The plot based on urban measurements below shows the same sort of pattern as the modelled version above, i.e., highest concentrations at low wind speeds.
# import AQ measurements
kc1 <- importAURN(site = "kc1", year = 2003)
# import met data (default site at Heathrow is OK, so just need year)
met <- importNOAA(year = 2003)
# merge data sets
kc1 <- inner_join(kc1, met, by = "date")
# make a plot
polarPlot(kc1, pollutant = "nox", cols = "jet")2.2 Point (chimney stack) source
Point source emissions released at height and with buoyancy (because the plume is hot) can show a very different pattern on polar plots.
In this case a tall (250m) chimney stack was modelled with a receptor 1 km from the source. This type of source is typical of a large power station.
In this case the ADMS model shows a very different pattern of concentration compared to roads or large volume sources. Highest concentrations are observed under high wind speed conditions, which is opposite to the behaviour of a source representing a large urban area of ground-level sources such as road vehicles and domestic combustion.
We can consider these plume effects in more detail to see how a plume can be brought down to ground level under higher wind speed conditions. Figure 3 shows the plume centreline for a tall stack (typical of a large power station) under two conditions: 2 m s-1 and 12 m s-1. The figure clearly shows the lower mean plume height under higher wind speed conditions. In addition, under windy conditions there will be more mechanical turbulence generated that will also aid plume mixing.
Again, it is possible to consider real measurements to see if similar patterns are observed. The North Kensington site is affected by power stations to the east of London, which is most clear for SO2 concentrations…
polarPlot(kc1, pollutant = "so2")Large power stations are not the only source that could results in high concentrations under high wind speed conditions. For other pollutants, other processes can be important. For example, the suspension (or re-suspension) of particulate can occur under high wind speed conditions, as can sea-salt aerosol. Non-local sources e.g. secondary particulate and the long-range transport of dust can also be observed under high wind speed conditions.
2.3 Road or line sources
The pattern for road sources can vary a lot but is often shown as being clearly offset from the axes, as shown in the plot below where the road is to the south. For this particular example the road is assumed to be in open, flat terrain and is not affected by complex dispersion.
The pattern can however be highly complex, if (for example) the road is located in a street canyon. In these conditions, the apparent location of the source can be opposite to that expected because of recirculation of the plume in the canyon. Marylebone Road in central London is a good example of a ‘classical’ street canyon situation. Indeed, the data that comes with openair (mydata) is about 8 years of data from Marylebone Road, so it is easy to plot:
polarPlot(mydata, pollutant = "nox",
col = "jet")The Marylebone Road site is on the south side of the street (see here). However, the highest concentrations are wind the wind is coming from the south i.e. from the site to the road and not the other way around (see Figure here).
The
polarPlot()function is very flexible and by default will use wind speed (ws) on the radial axis.It is possible to consider statistics other than the mean, including median, max and percentile.
It can be useful to explore the conditions that dominate the highest concentrations using
statistic = "cpf"and supplying a percentile value e.g.percentile = 90, which will show the probability that the concentration of a species is above the percentile value.The relationship between two pollutants can be considered by supplying two pollutant names and a statistic to consider e.g.
pollutant = c("pm10", "nox")andstatistic = "r".A related function
polarCluster()provides a way of extracting interesting features for further analysis.
3 Example of isoprene
Isoprene is an interesting example to consider because the source is mostly biogenic in origin — although some isoprene is also emitted by road vehicles. The Harwell site is one of the few in the UK that measures hydrocarbons along with other species. The site itself is in rural Oxfordshire.
Note that in this case that we choose to use measured meteorological data rather than modelled; and the modelled data is removed during the joining of the two data sets.
hc <- importAURN("har", 2013, hc = TRUE)
# import met data from nearby site (Benson)
met <- importNOAA(code = "036580-99999", year = 2013)
# merge AQ and met but don't use modelled ws and wd
hc <- inner_join(
select(hc, -ws, -wd, -air_temp),
met,
by = "date"
)You can take a look at the data by typing glimpse(hc), which shows there is alot of data available.
The polar plot below shows that the highest concentrations of isoprene tend to occur at low wind speeds, especially when the wind direction is from the south.
For ‘fun’ and an excuse to use some R, we’ll also try a different colour scheme using the viridis package (see here for details of the colour schemes). In openair, simplified versions of the viridis colour schemes exist for: “viridis”, “plasma”, “magma”, “inferno” and “cividis”.
polarPlot(hc, pollutant = "isoprene", col = "plasma")However, the radial axis does not have to be wind speed. In this case it is interesting to use temperature (air_temp) for the radial axis. The plot below shows that the highest concentrations of isoprene occur when the temperature is high (and the wind direction is from the south). This behaviour is entire expected from what is known about isoprene emissions, i.e., released at higher ambient temperatures.
Also, to demonstrate how to be more flexible with colours, we can take an openair colour palette and reverse the ordering of the colours for different effects using the openColours() function.
my_cols <- rev(openColours("magma", 10))What do polar plots look like for some interesting sites in for sites in the UK? It can be interesting to contrast different pollutants at the same site, or between sites.
4 Conditional probability functions
Often the interest in knowing which directions and under which conditions the highest concentrations occur for a pollutant. The Conditional Probability Function (CPF) is a useful approach here Uria-Tellaetxe and Carslaw (2014), see open-access paper here. It is usually expressed as the probability that a certain concentration (often a percentile such as the 90th) is exceeded in a given wind sector. In the bivariate (polarPlot) case it is the probability that a concentration is exceeded in a given wind-speed/ wind direction interval. In openair there are two main functions to plot CPFs: percentileRose() and polarPlot().
The percentileRose() function will plot the concentration at one or more percentile values by wind direction. For example, a plot of several percentile concentrations of benzene can be made by
percentileRose(
hc, pollutant = "benzene",
percentile = c(50, 75, 90, 95), cols = "viridis"
)For example, based on the isoprene plot above, we could plot the concentrations in a different way by asking “what is the probability that a concentration exceeds a particular percentile in any wind speed-direction bin?”. This sort of plot can give a clearer indication of the origins of the highest concentration conditions.
polarPlot(
hc,
pollutant = "isoprene",
x = "air_temp", # a different radial axis ...
statistic = "cpf",
percentile = 75,
cols = my_cols
)percentile option.Take some time to explore the use of the CPF in polarPlot with data of local interest to you.
5 Clustering surfaces and investigating clusters
The patterns shown by polar plots often reveal interesting features that warrant further investigation — but how best to identify features? They can be highly influenced by the colour schemes used. One approach is to use cluster analysis applied to the surface to identify areas of common similarities of concentration, wind speed and direction (David C. Carslaw and Beevers 2013).
As an example, we can consider the North Kensington site in London. To make it more interesting, we will focus on conditions where the is a high chance of high concentrations with a percentile value of 90. Try plotting the surface first:
polarPlot(kc1,
pollutant = "so2",
statistic = "cpf",
percentile = 90
)Now we can explore how clustering might partition the surface by considering cluster solutions from 2:5. The plot below shows that a two cluster solution is fine for our purposes and nicely isolates the peak to the east.
polarCluster(kc1,
pollutant = "so2",
statistic = "cpf",
percentile = 90,
n.clusters = 2:5
)Having decided that a two-cluster solution is what we want, we can carry out further analysis. Note that in this case we explicitly choose the colours to use, which we can carry through to subsequent plots.
clust <- polarCluster(kc1,
pollutant = "so2",
statistic = "cpf",
percentile = 90,
n.clusters = 2,
cols = c("tomato", "turquoise4")
)clust is an object that returns various things including the original hourly data and the associated clusters.
We can have a look at the data and read it into a data frame for further uses. We can retrieve the results of the two-cluster solution as follows.
clust_data <- clust$data
# note column called `cluster`
names(clust_data) [1] "site" "code.x" "date" "co" "nox"
[6] "no2" "no" "o3" "so2" "pm10"
[11] "code.y" "station" "latitude" "longitude" "elev"
[16] "ws" "wd" "air_temp" "atmos_pres" "visibility"
[21] "dew_point" "RH" "ceil_hgt" "cl_1" "cl_2"
[26] "cl_3" "cl" "cl_1_height" "cl_2_height" "cl_3_height"
[31] "precip_12" "precip_6" "pwc" "precip" ".id"
[36] "cluster"
Now we can use the data and plot and analyse it by cluster number. What this shows is that cluster 2 (the ‘feature’) has much higher concentrations than cluster 1 (everything else).
group_by(clust_data, cluster) %>%
summarise(
mean = mean(so2, na.rm = TRUE),
max = max(so2, na.rm = TRUE),
n = n()
)# A tibble: 3 × 4
cluster mean max n
<fct> <dbl> <dbl> <int>
1 1 4.24 40 7145
2 2 9.99 80 1494
3 <NA> 3.59 24 121
We can explore the temporal variations of these clusters also.
timeVariation(clust_data,
pollutant = "so2",
group = "cluster",
cols = c("tomato", "turquoise4")
)It is also useful to consider what the time series looks like if we identify the proportion of the daily concentration that is due to each cluster, which is shown in Figure 16.
timeProp(clust_data,
pollutant = "so2",
proportion = "cluster",
avg.time = "day",
cols = c("tomato", "turquoise4")
)Using some of the techniques described above, explore some of the air pollution characteristics at other sites. See if you can explore the results of cluster analysis using other openair plots such as calendarPlot() and timeVariation(). With more years of data it can be very useful to plot trends by cluster — something to try…
6 Polar plots on an interactive map
You have already seen the {openairmaps} package and how it is used for plotting information on the UK air pollution networks. It has also been developed to plot bivariate polar plots on an interactive leaflet map.
library(openairmaps)The function polarMap() requires information on the site, latitude and longitude in addition to the usual information required by the polarPlot() function (e.g. wind speed, ws and direction, wd). The package comes with some example data (polar_data) that can be used as a template for using different data.
Rows: 35,040
Columns: 13
$ date <dttm> 2009-01-01 00:00:00, 2009-01-01 01:00:00, 2009-01-01 02:00…
$ nox <dbl> 113, 40, 48, 36, 40, 50, 50, 53, 80, 111, 206, 113, 86, 82,…
$ no2 <dbl> 46, 32, 36, 29, 32, 36, 34, 34, 50, 59, 67, 61, 52, 53, 52,…
$ pm2.5 <dbl> 42, 45, 43, 37, 36, 33, 33, 31, 27, 28, 37, 30, 27, 29, 27,…
$ pm10 <dbl> 46, 49, 46, NA, 38, 32, 36, 32, 30, 32, 39, 37, 32, 33, 34,…
$ site <chr> "London Bloomsbury", "London Bloomsbury", "London Bloomsbur…
$ lat <dbl> 51.52229, 51.52229, 51.52229, 51.52229, 51.52229, 51.52229,…
$ lon <dbl> -0.125889, -0.125889, -0.125889, -0.125889, -0.125889, -0.1…
$ site_type <chr> "Urban Background", "Urban Background", "Urban Background",…
$ wd <dbl> 58.92536, 74.46675, 30.00000, 45.00000, 70.00000, 46.63627,…
$ ws <dbl> 2.066667, 1.900000, 1.550000, 2.100000, 1.500000, 2.100000,…
$ visibility <dbl> 5000.000, 4933.333, 5000.000, 4900.000, 5000.000, 6000.000,…
$ air_temp <dbl> 0.8666667, 0.8666667, 0.8000000, 0.8500000, 0.8666667, 0.96…
Plot an interactive map.
polarMap(polar_data, pollutant = "no2", popup = "site")
Or with a different base map (there are many!).
polarMap(
polar_data,
pollutant = "no2",
popup = "site",
provider = "Stamen.Toner"
)See if you can plot a selection of sites in this way. You will need to link air quality data, site meta data (latitude and longitude) and meteorological data. It would be interesting for example to consider concentrations of PM2.5, PM10 or a sub-component of PM.
By default, each polar plot is scaled individually. However, if the same scale is required for all plots the limits can be supplied directly e.g. limits = c(0, 50) and the overall scale shown on the map.
7 Polar difference plots
Another use of polar plots is to consider how concentrations of a pollutant change in time. The most common way of understanding changes in concentration is considering different types of trend analysis e.g. smooth trends and nonparametric linear trends.
However, it can be very useful to consider difference in pollutant concentration in time on a polar plot. This is what the polarDiff function does — it takes two data frames and calculates the difference between them as a polar plot.
The example given in the {openair} book highlights a situation at the steelworks at Port Talbot in Wales, which is briefly shown here. (You can use the {openairmaps} networkMap function and network as “waqn” to see where the sites are in Wales).
The first step is to import data using importAURN or import WAQN to obtain the air quality data for 2012 and 2021; example years that we want to contrast. In this case we will use the modelled meteorological data rather than obtaining surface measurements.
port_talbot <- importAURN(site = "pt4", year = c(2012, 2019))
names(port_talbot) [1] "site" "code" "date" "co" "nox" "no2"
[7] "no" "o3" "so2" "pm10" "pm2.5" "v10"
[13] "v2.5" "nv10" "nv2.5" "ws" "wd" "air_temp"
Now we can contrast the two years of data (2012 and 2019) to consider how concentrations of SO2 have changed.
polarDiff(
before = selectByDate(port_talbot, year = 2012),
after = selectByDate(port_talbot, year = 2019),
pollutant = "so2"
)It is useful to consider different sites, pollutants and years think about the nature of the changes shown. It is also possible to contrast two sites rather than two periods from the same site e.g. a background and roadside pair.
8 Relationship between two variables
In a further, more advanced, extension to polar plots, it is possible to consider how two variables are related to one another in a bivariate polar plot Grange, Lewis, and Carslaw (2016). The main idea behind this new approach is that by understanding how two pollutants are related, information can be provided about the nature of the source(s) involved. While it is possible to consider statistics such as the Pearson correlation coefficient, r, for measurement data, much greater insight is possible by additionally considering whether there are higher or lower correlations for particular wind speeds and directions. Consider for example two metals emitted at a steelworks — it may well be that the highest correlation between the two metals occurs for very specific wind speeds and directions due to a particular process used to manufacture steel. Considering the correlation using all data may show evidence that two pollutants are correlated but not reveal they only tend to be correlated under some conditions.
A scatter plot of PM2.5 and PM10 does show a strong relationship with no obvious issues (see plot below). We might conclude that PM2.5 and PM10 are highly correlated under all conditions — but is that true?
scatterPlot(hc, x = "pm10", y = "pm2.5",
fill = "dodgerblue", col = "white", pch = 21,
linear = TRUE)As an example, we can plot the correlation surface between PM2.5 and PM10 for the rural Harwell site. The plot below shows that PM2.5 and PM10 and very highly correlated for easterly winds, which in this case will almost certainly be due to transboundary contributions and a lot of secondary aerosol.
polarPlot(hc, poll = c("pm2.5", "pm10"),
statistic = "r",
col = "jet",
wd_spread = 10, ws_spread = 1)We can explore the relationship further by considering the slope of the regression between PM2.5 and PM10. In this plot there is a clear region where almost all the PM10 is PM2.5 (from about zero to 140 degrees).
polarPlot(hc, poll = c("pm2.5", "pm10"),
statistic = "robust_slope",
wd_spread = 10, ws_spread = 1,
col = "jet")We can usefully identify these two regions by applying cluster analysis of the data as before. Two clusters is sufficient to select the two regions.
clust <- polarCluster(hc, poll = c("pm2.5", "pm10"),
statistic = "robust_slope",
n.clusters = 2,
col = "viridis")Now we have the data split by cluster we can separately apply a regression line through each to see if they really are different. The plot below shows the two cluster regression lines. Compared with the basic PM2.5 vs. PM10 plot shown earlier, it is not obvious at all there are two main regimes contained in the data — the overall regression is very good and it looks like there is only one main effect. However, if we carefully split the data using robust regression applied to the polar plot, followed by cluster analysis we can do a much better job of revealing the two different behaviours.
As shown in the plot below, cluster two has a slope of 0.98 i.e. nearly all the PM2.5 is PM10. However, cluster 1 has a slope of 0.49 i.e. half of cluster 2. What this analysis shows is how it is possible to reveal hidden behaviours in the data that would otherwise be missed.
What this approach demonstrates is how it is possible to identify and extract different behaviours in the data. For a fuller analysis, it might be useful to use the clusters identified and then link the results with other data, e.g., sulphate and nitrate data to explore the composition in more detail.
scatterPlot(
clust$data, x = "pm10", y = "pm2.5", group = "cluster",
fill = c("dodgerblue", "tomato"), col = "white", pch = 21,
linear = TRUE
)A useful exercise is to try and do a better job of the scatter plot shown above using ggplot2. The plot might be clearer for example if each cluster plot was shown in its own pane.
Finally, in openair is it easy to plot the relationship between PM2.5 and PM10 and fit a linear regression model by using the inbuilt ‘type’ option where it is easy to consider the relationship by wind sector. Now it is possible to unpick things more. For easterly winds it seems that almost all the PM10 is in the form of PM2.5 (likely dominated by small sulphate / nitrate aerosol). The relationship is also strong for these wind sectors (as seen by the r2 value). For other wind sectors the relationships are less good and eh slope value lower.
scatterPlot(hc, x = "pm10", y = "pm2.5",
fill = "dodgerblue", col = "white", pch = 21,
type = "wd", linear = TRUE)